

Search for: All records

Creators/Authors contains: "Rokas, Antonis"


  1. Introduction

    Eukaryotic life depends on the functional elements encoded by both the nuclear genome and organellar genomes, such as those contained within the mitochondria. The content, size, and structure of the mitochondrial genome vary across organisms, with potentially large implications for phenotypic variance and resulting evolutionary trajectories. Among yeasts in the subphylum Saccharomycotina, extensive differences have been observed in various species relative to the model yeast Saccharomyces cerevisiae, but mitochondrial genome sampling across many groups has been scarce, even as hundreds of nuclear genomes have become available.

    Methods

    By extracting mitochondrial assemblies from existing short-read genome sequence datasets, we have greatly expanded both the number of available genomes and the coverage across sparsely sampled clades.

    Results

    Comparison of 353 yeast mitochondrial genomes revealed that, while size and GC content were fairly consistent across species, those in the genera Metschnikowia and Saccharomyces trended larger, while several species in the order Saccharomycetales, which includes S. cerevisiae, exhibited lower GC content. Extreme examples for both size and GC content were scattered throughout the subphylum. All mitochondrial genomes shared a core set of protein-coding genes for Complexes III, IV, and V, but they varied in the presence or absence of mitochondrially encoded canonical Complex I genes. We traced the loss of Complex I genes to a major event in the ancestor of the orders Saccharomycetales and Saccharomycodales, but we also observed several independent losses in the orders Phaffomycetales, Pichiales, and Dipodascales. In contrast to prior hypotheses based on smaller-scale datasets, comparison of evolutionary rates in protein-coding genes showed no bias towards elevated rates among aerobically fermenting (Crabtree/Warburg-positive) yeasts. Mitochondrial introns were widely distributed, but they were highly enriched in some groups. The majority of mitochondrial introns were poorly conserved within groups, but several were shared within groups, between groups, and even across taxonomic orders, which is consistent with horizontal gene transfer, likely involving homing endonucleases acting as selfish elements.

    Discussion

    As the number of available fungal nuclear genomes continues to expand, the methods described here to retrieve mitochondrial genome sequences from these datasets will prove invaluable to ensuring that studies of fungal mitochondrial genomes keep pace with their nuclear counterparts.

     
    Free, publicly-accessible full text available November 23, 2024
  2. Abstract

    The ∼1,200 known species in subphylum Saccharomycotina are a highly diverse clade of unicellular fungi. During its lifecycle, a typical yeast exhibits multiple cell types with various morphologies; these morphologies vary across Saccharomycotina species. Here, we synthesize the evolutionary dimensions of variation in cellular morphology of yeasts across the subphylum, focusing on variation in cell shape, cell size, type of budding, and filament production. Examination of 332 representative species across the subphylum revealed that the most common budding cell shapes are ovoid, spherical, and ellipsoidal, and that their average length and width are 5.6 µm and 3.6 µm, respectively. Of the yeast species examined, 58.4% can produce filamentous cells, and 87.3% reproduce asexually by multilateral budding, which does not require utilization of cell polarity for mitosis. Interestingly, ∼1.8% of species examined have not been observed to produce budding cells, but rather only produce filaments of septate hyphae and/or pseudohyphae. Sexual cycle descriptions exist for 76.9% of the yeast species examined, with most producing one to four ascospores that are most commonly hat-shaped (37.4%). Systematic description of yeast cellular morphological diversity and reconstruction of its evolution promises to enrich our understanding of the evolutionary cell biology of this major fungal lineage.

     
  3. Abstract

    PhyloFisher is a software package written primarily in Python3 that can be used for the creation, analysis, and visualization of phylogenomic datasets that consist of protein sequences from eukaryotic organisms. Unlike many existing phylogenomic pipelines, PhyloFisher comes with a manually curated database of 240 protein‐coding genes, a subset of a previous phylogenetic dataset sampled from 304 eukaryotic taxa. The software package can also utilize a user‐created database of eukaryotic proteins, which may be more appropriate for shallow evolutionary questions. PhyloFisher is also equipped with a set of utilities to aid in running routine analyses, such as the prediction of alternative genetic codes, removal of genes and/or taxa based on occupancy/completeness of the dataset, testing for amino acid compositional heterogeneity among sequences, removal of heterotachious and/or fast‐evolving sites, removal of fast‐evolving taxa, supermatrix creation from randomly resampled genes, and supermatrix creation from nucleotide sequences. © 2024 Wiley Periodicals LLC.

    Basic Protocol 1: Constructing a phylogenomic dataset

    Basic Protocol 2: Performing phylogenomic analyses

    Support Protocol 1: Installing PhyloFisher

    Support Protocol 2: Creating a custom phylogenomic database

     
    Free, publicly-accessible full text available January 1, 2025
  4. Free, publicly-accessible full text available August 1, 2024
  5. Townsend, Jeffrey (Ed.)
    Abstract Xylose is the second most abundant monomeric sugar in plant biomass. Consequently, xylose catabolism is an ecologically important trait for saprotrophic organisms, as well as a fundamentally important trait for industries that hope to convert plant mass to renewable fuels and other bioproducts using microbial metabolism. Although common across fungi, xylose catabolism is rare within Saccharomycotina, the subphylum that contains most industrially relevant fermentative yeast species. The genomes of several yeasts unable to consume xylose have been previously reported to contain the full set of genes in the XYL pathway, suggesting the absence of a gene–trait correlation for xylose metabolism. Here, we measured growth on xylose and systematically identified XYL pathway orthologs across the genomes of 332 budding yeast species. Although the XYL pathway coevolved with xylose metabolism, we found that pathway presence only predicted xylose catabolism about half of the time, demonstrating that a complete XYL pathway is necessary, but not sufficient, for xylose catabolism. We also found that XYL1 copy number was positively correlated, after phylogenetic correction, with xylose utilization. We then quantified codon usage bias of XYL genes and found that XYL3 codon optimization was significantly higher, after phylogenetic correction, in species able to consume xylose. Finally, we showed that codon optimization of XYL2 was positively correlated, after phylogenetic correction, with growth rates in xylose medium. We conclude that gene content alone is a weak predictor of xylose metabolism and that using codon optimization enhances the prediction of xylose metabolism from yeast genome sequence data. 
    Free, publicly-accessible full text available June 1, 2024
  6. Stajich, Jason E. (Ed.)
    ABSTRACT Insect-associated fungi play an important role in wild and agricultural communities. We present a draft genome sequence of an entomopathogenic strain from the fungal genus Aspergillus, isolated from a honey bee pupa.
  7. Abstract Summary

    GSEL is a computational framework for calculating the enrichment of signatures of diverse evolutionary forces in a set of genomic regions. GSEL can flexibly integrate any sequence-based evolutionary metric and analyze sets of human genomic regions identified by genome-wide assays (e.g. GWAS, eQTL, *-seq). The core of GSEL’s approach is the generation of empirical null distributions tailored to the allele frequency and linkage disequilibrium structure of the regions of interest. We illustrate the application of GSEL to variants identified from a GWAS of body mass index, a highly polygenic trait.
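    The matched-null idea described above can be illustrated with a minimal sketch. This is not GSEL's code or API: the function name `matched_null`, the dictionary keys `maf_bin`, `ld_bin`, and `score`, and the binning scheme are all hypothetical, chosen only to show how control variants matched on allele frequency and linkage disequilibrium can yield an empirical null for the mean of an evolutionary metric.

```python
import random
from collections import defaultdict


def matched_null(targets, background, n_sets=1000, seed=0):
    """Empirical null for the mean score of `targets` (illustrative only).

    Each variant is a dict with hypothetical keys:
      maf_bin - allele-frequency bin label
      ld_bin  - bin label for the number of LD partners
      score   - any sequence-based evolutionary metric
    """
    rng = random.Random(seed)

    # Pool background scores by (allele-frequency bin, LD bin).
    pools = defaultdict(list)
    for variant in background:
        pools[(variant["maf_bin"], variant["ld_bin"])].append(variant["score"])

    # Every target needs at least one matched control to draw from.
    for variant in targets:
        if not pools.get((variant["maf_bin"], variant["ld_bin"])):
            raise ValueError("no matched background variant for a target")

    observed = sum(v["score"] for v in targets) / len(targets)

    # Each null set draws one matched control per target variant.
    null_means = []
    for _ in range(n_sets):
        draw = [rng.choice(pools[(v["maf_bin"], v["ld_bin"])]) for v in targets]
        null_means.append(sum(draw) / len(draw))

    # One-sided empirical p-value with the usual +1 correction.
    exceed = sum(mean >= observed for mean in null_means)
    p_value = (exceed + 1) / (n_sets + 1)
    return observed, null_means, p_value


background = [{"maf_bin": 0, "ld_bin": 0, "score": float(s)} for s in range(10)]
targets = [{"maf_bin": 0, "ld_bin": 0, "score": 9.0} for _ in range(2)]
observed, null_means, p_value = matched_null(targets, background, n_sets=200)
```

    Drawing controls from the same frequency and LD bins as each target keeps the null sets comparable to the regions of interest, so a small `p_value` reflects the metric rather than those confounders.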

    Availability and implementation

    GSEL is implemented as a fast, flexible and user-friendly Python package. It is available with demonstration data at https://github.com/abraham-abin13/gsel_vec.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  8. Abstract

    Telomere healing occurs when telomerase, normally restricted to chromosome ends, acts upon a double-strand break to create a new, functional telomere. De novo telomere addition (dnTA) on the centromere-proximal side of a break truncates the chromosome but, by blocking resection, may allow the cell to survive an otherwise lethal event. We previously identified several sequences in the baker's yeast, Saccharomyces cerevisiae, that act as hotspots of dnTA [termed Sites of Repair-associated Telomere Addition (SiRTAs)], but the distribution and functional relevance of SiRTAs are unclear. Here, we describe a high-throughput sequencing method to measure the frequency and location of telomere addition within sequences of interest. Combining this methodology with a computational algorithm that identifies SiRTA sequence motifs, we generate the first comprehensive map of telomere-addition hotspots in yeast. Putative SiRTAs are strongly enriched in subtelomeric regions where they may facilitate formation of a new telomere following catastrophic telomere loss. In contrast, outside of subtelomeres, the distribution and orientation of SiRTAs appear random. Since truncating the chromosome at most SiRTAs would be lethal, this observation argues against selection for these sequences as sites of telomere addition per se. We find, however, that sequences predicted to function as SiRTAs are significantly more prevalent across the genome than expected by chance. Sequences identified by the algorithm bind the telomeric protein Cdc13, raising the possibility that association of Cdc13 with single-stranded regions generated during the response to DNA damage may facilitate DNA repair more generally.

     
  9. Hejnol, Andreas (Ed.)
    Molecular evolution studies, such as phylogenomic studies and genome-wide surveys of selection, often rely on gene families of single-copy orthologs (SC-OGs). Large gene families with multiple homologs in 1 or more species (a phenomenon observed among several important families of genes such as transporters and transcription factors) are often ignored because identifying and retrieving SC-OGs nested within them is challenging. To address this issue and increase the number of markers used in molecular evolution studies, we developed OrthoSNAP, a software tool that uses a phylogenetic framework to simultaneously split gene families into SC-OGs and prune species-specific inparalogs. We term SC-OGs identified by OrthoSNAP as SNAP-OGs because they are identified using a splitting and pruning procedure analogous to snapping branches on a tree. From 415,129 orthologous groups of genes inferred across 7 eukaryotic phylogenomic datasets, we identified 9,821 SC-OGs; using OrthoSNAP on the remaining 405,308 orthologous groups of genes, we identified an additional 10,704 SNAP-OGs. Comparison of SNAP-OGs and SC-OGs revealed that their phylogenetic information content was similar, even in complex datasets that contain a whole-genome duplication, complex patterns of duplication and loss, transcriptome data where each gene typically has multiple transcripts, and contentious branches in the tree of life. OrthoSNAP is useful for increasing the number of markers used in molecular evolution data matrices, a critical step for robustly inferring and exploring the tree of life.
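    As a rough illustration of the splitting-and-pruning idea in the abstract above, the sketch below works on a toy gene tree written as nested tuples. It is a simplification under stated assumptions, not OrthoSNAP's implementation: the leaf format `species|gene`, the function names, the `min_taxa` threshold, and the rule for choosing which inparalog to keep are all hypothetical.

```python
def species_of(leaf):
    """Leaves are assumed to be named 'species|gene'."""
    return leaf.split("|", 1)[0]


def leaves(tree):
    """Return all leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    found = []
    for child in tree:
        found.extend(leaves(child))
    return found


def prune_inparalogs(tree):
    """Collapse any clade whose leaves all come from one species to one leaf."""
    if isinstance(tree, str):
        return tree
    tips = leaves(tree)
    if len({species_of(tip) for tip in tips}) == 1:
        # Keep an arbitrary but deterministic representative (assumption).
        return sorted(tips)[0]
    return tuple(prune_inparalogs(child) for child in tree)


def split_single_copy(tree, min_taxa=3):
    """Return the largest subtrees in which every species occurs exactly once."""
    tips = leaves(tree)
    species = [species_of(tip) for tip in tips]
    if len(species) == len(set(species)):
        # Single copy throughout: keep it only if enough taxa are present.
        return [sorted(tips)] if len(tips) >= min_taxa else []
    groups = []
    for child in tree:  # duplicates remain, so descend into each child
        groups.extend(split_single_copy(child, min_taxa))
    return groups


# Toy family: species A has an inparalog pair, and the family as a whole
# contains two copies per species after an older duplication.
family = (
    (("A|g1", "A|g2"), ("B|g1", "C|g1")),
    (("A|g3", "B|g2"), "C|g2"),
)
groups = split_single_copy(prune_inparalogs(family))
```

    In this toy case the family is not single-copy as a whole, yet pruning the species-specific pair and splitting at the older duplication recovers two single-copy groups of three taxa each.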